fix device_id bug for final_state op in multiprocess testcase #41407
Merged
Your PR was submitted successfully. Thank you for contributing to this open-source project!
pangyoki
changed the title
support final_state op in multiprocess testcase
fix device_id bug for final_state op in multiprocess testcase
Apr 6, 2022
chenwhql
approved these changes
Apr 6, 2022
pangyoki
added a commit
to pangyoki/Paddle
that referenced
this pull request
Apr 6, 2022
…cess testcase (PaddlePaddle#41407) * support final_state in multiprocess * fix no place.device * set device_id in eager_gen
douch
pushed a commit
to douch/Paddle
that referenced
this pull request
Apr 10, 2022
…Paddle#41407) * support final_state in multiprocess * fix no place.device * set device_id in eager_gen
Thunderbrook
pushed a commit
that referenced
this pull request
Apr 22, 2022
* [cherry-pick2.3]fix compile bug of windows cuda11.5 (#41464) cherry-pick fix compile bug of windows cuda11.5 #41433 * fix bug of missing boost when compile cache.cc (#41449) 【chery-pick #41430】fix bug of random compile failure, due to incorrect compile order of dependencies * Fix eager try catch (#41438) (#41477) [Cherry-Pick]Fix eager try catch (#41438) * Cherry-pick-PR41407, fix device_id bug for final_state op in multiprocess testcase (#41407) (#41475) Cherry-pick PR #41407 * [BugFix] Add error hint for one_hot gpu version (#41335) (#41495) * add one_hot gpu hint * move allow_out_of_range judgement * delete useless unittest * fix bugs of reshape double grad infermeta (#41459) (#41493) * [cherrypick-2.3] modify infer gpu memory strategy (#41427), remove cudnn_deterministic=True (#41341) (#41491) Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com> * [Cherry-pick][ROCm] fix dcu error in device event base, test=develop (#41523) Cherry-pick of #41521 * [Cherry-Pick]Cherry pick PR41200, PR41474, PR41382 (#41509) * Use `self`as a parameter of _hash_with_id function to avoid error caused by hash_id reuse (#41200) * Add fill_constant_batch_size YAML and UT (#41474) * Switch some dy2st UT to eager mode (#41382) * Sitch some dy2st UT to eager mode * Fix test_lstm and remove test_transformer * Run test_resnet_v2 in old dy mode * Unittest recover (#41431) * update name * update name * fix test * fix fleet bind * update name * update name * fix test * fix gpups wrapper * remove Push/Pull/Load/Save with context in client and wrapper base class * fix * fix * remove some interface * fix * remove * code style * recover * fix * remove code unused * remove some unused table & accessor & CommonDenseTable => MemoryDenseTable * fix * fix * fix * recover * remove unused code * recover unittest * fix * remove * fix * remove code unuseful * remove * fix * recover * remove Co-authored-by: esythan <esythan@126.com> * add ssd sparse table * fix * add cache 
shuffle * fix * fix * fix * fix * fix * fix * add unit test * fix Co-authored-by: Zhou Wei <1183042833@qq.com> Co-authored-by: Sing_chan <51314274+betterpig@users.noreply.github.com> Co-authored-by: 0x45f <23097963+0x45f@users.noreply.github.com> Co-authored-by: pangyoki <pangyoki@126.com> Co-authored-by: Siming Dai <908660116@qq.com> Co-authored-by: YuanRisheng <yuanrisheng@baidu.com> Co-authored-by: Zhang Jun <ewalker@live.cn> Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com> Co-authored-by: Qi Li <qili93@qq.com> Co-authored-by: esythan <esythan@126.com>
PR types
Bug fixes
PR changes
Others
Describe
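Problem
In the distributed multi-process unit test test_eager_dist_api.py, using a final-state op raises an error when GetDeviceContextByBackend obtains the device.
Cause
In the distributed multi-process scenario, when the subprocess for gpu1 runs, the device returned by GetCurrentDeviceId is place0, but the expected device is place1, so DeviceContextPool cannot Get the corresponding place.
Solution
Before the new dynamic graph (eager mode) executes a kernel, SetDeviceId must be used to set place.device in advance.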
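The failure and the fix described above can be illustrated with a small, self-contained Python model. This is only a sketch under stated assumptions, not Paddle's code: `Place`, `ToyDeviceContextPool`, and `run_kernel` are hypothetical names invented for illustration, and the model assumes that the gpu1 subprocess holds a context only for its own device while the "current device id" defaults to 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    """Hypothetical stand-in for a device place (backend plus device index)."""
    backend: str
    device: int


class ToyDeviceContextPool:
    """Toy pool that holds contexts only for the places it was created with."""

    def __init__(self, places):
        self._contexts = {p: f"ctx<{p.backend}:{p.device}>" for p in places}

    def get(self, place):
        # Mirrors the reported failure: looking up a place the pool does not hold.
        if place not in self._contexts:
            raise KeyError(f"no device context for {place}")
        return self._contexts[place]


def run_kernel(pool, expected_place, set_device_first):
    """Look up the context the way the description suggests a kernel launch does.

    Assumption: without an explicit set, the current device id is 0.
    """
    current_device_id = 0
    if set_device_first:
        # The fix in this model: pin the device id before the lookup.
        current_device_id = expected_place.device
    return pool.get(Place(expected_place.backend, current_device_id))


# The subprocess that owns gpu1 only has a context for gpu:1.
gpu1 = Place("gpu", 1)
pool = ToyDeviceContextPool([gpu1])

# With the device id set first, the lookup succeeds.
ctx = run_kernel(pool, gpu1, set_device_first=True)

# Without it, the lookup targets gpu:0 and fails.
try:
    run_kernel(pool, gpu1, set_device_first=False)
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

In this model the only difference between the failing and the passing call is whether the device id is pinned before the pool lookup, which is the behaviour the PR description attributes to setting place.device ahead of kernel execution.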